Missing Data Prediction and Classification: The Use of Auto-Associative Neural Networks and Optimization Algorithms

Authors

  • Collins Leke
  • Bhekisipho Twala
  • Tshilidzi Marwala
Abstract

This paper presents methods for approximating missing data in a dataset by using optimization algorithms to optimize the network parameters, after which prediction and classification tasks can be performed. The optimization methods considered are genetic algorithm (GA), simulated annealing (SA), particle swarm optimization (PSO), random forest (RF) and negative selection (NS). Each is used individually in combination with an auto-associative neural network (AANN) for missing data estimation, and the results are compared. The proposed methods use the optimization algorithms to minimize an error function derived from training the auto-associative neural network, during which the interrelationships between the inputs and the outputs are learned and stored in the weights connecting the different layers of the network. The error function is expressed as the square of the difference between the actual observations and the values predicted by the auto-associative neural network. When data are missing, not all values of the actual observations are known; hence, the error function is decomposed so that it depends on both the known and the unknown variable values. A multi-layer perceptron (MLP) architecture is employed for the neural networks, which are trained using the scaled conjugate gradient (SCG) method. Prediction accuracy is measured by the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and correlation coefficient (r). Classification accuracy is assessed by plotting ROC curves and calculating the areas under them. The results show that the approach using RF with AANN produces the most accurate predictions and classifications, while the approach using NS with AANN is the least accurate.
Keywords—Missing Data, Auto-Associative Neural Network, Multi-Layer Perceptron, Genetic Algorithm, Simulated Annealing, Particle Swarm Optimization, Random Forest, Negative Selection.
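
The error-function idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the "network" is a fixed linear map standing in for a trained AANN, the toy data relationship is x2 = x0 + x1, and a plain random local search stands in for the GA, SA, or PSO optimizers. The function names (aann_forward, reconstruction_error, random_search) are hypothetical.

```python
import random

# Hypothetical stand-in for a trained AANN: a linear projection onto the
# plane x2 = x0 + x1, so inputs that satisfy the relationship are reproduced.
W = [
    [2 / 3, -1 / 3, 1 / 3],
    [-1 / 3, 2 / 3, 1 / 3],
    [1 / 3, 1 / 3, 2 / 3],
]


def aann_forward(x, weights):
    """Return the network output for input vector x (linear stand-in)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def assemble(known, unknown_values, unknown_idx, n):
    """Merge known entries (index -> value) with candidate missing values."""
    x = [0.0] * n
    for i, v in known.items():
        x[i] = v
    for i, v in zip(unknown_idx, unknown_values):
        x[i] = v
    return x


def reconstruction_error(unknown_values, known, unknown_idx, weights):
    """Squared difference between the input and its reconstruction.

    The input is split into a known part (fixed) and an unknown part
    (the candidate values being optimized).
    """
    x = assemble(known, unknown_values, unknown_idx, len(weights))
    y = aann_forward(x, weights)
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def random_search(known, unknown_idx, weights, iters=2000, step=0.1, seed=0):
    """Random local search, used here only as a simple stand-in optimizer."""
    rng = random.Random(seed)
    best = [0.0] * len(unknown_idx)
    best_err = reconstruction_error(best, known, unknown_idx, weights)
    for _ in range(iters):
        cand = [b + rng.gauss(0.0, step) for b in best]
        err = reconstruction_error(cand, known, unknown_idx, weights)
        if err < best_err:  # keep a candidate only if it lowers the error
            best, best_err = cand, err
    return best, best_err


# Usage: x0 and x1 are observed, x2 is missing.
estimate, error = random_search({0: 1.0, 1: 2.0}, [2], W)
```

In this toy setting the search drives the missing entry toward the value that makes the full vector consistent with the relationship the stand-in network encodes. A real application would replace the linear map with a trained network and the random search with one of the optimizers named in the abstract.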


Similar resources

A Self-Reconstructing Algorithm for Single and Multiple-Sensor Fault Isolation Based on Auto-Associative Neural Networks

Recently, different approaches have been developed in the field of sensor fault diagnostics based on the Auto-Associative Neural Network (AANN). In this paper we present a novel algorithm called the Self-reconstructing Auto-Associative Neural Network (S-AANN), which is able to detect and isolate a single faulty sensor via reconstruction. We have also extended the algorithm to be applicable in multiple faul...


Comparison of Genetic and Hill Climbing Algorithms to Improve an Artificial Neural Networks Model for Water Consumption Prediction

No unique method has so far been specified for determining the number of neurons in the hidden layers of Multi-Layer Perceptron (MLP) neural networks used for prediction. The present research is intended to optimize the number of neurons using two meta-heuristic procedures, namely genetic and hill climbing algorithms. The data used in the present research for prediction are consumption data of water...


On the use of back propagation and radial basis function neural networks in surface roughness prediction

Various artificial neural network types are examined and compared for the prediction of surface roughness in manufacturing technology. The aim of the study is to evaluate different kinds of neural networks and observe their performance and applicability on the same problem. More specifically, feed-forward artificial neural networks are trained with three different back propagation algorithms, ...


Prediction of pore facies using GMDH-type neural networks: a case study from the South Pars gas field, Persian Gulf basin

The current study proposes a two-step approach for pore facies characterization in carbonate reservoirs, with an example from the Kangan and Dalan formations in the South Pars gas field. In the first step, pore facies were determined based on Mercury Injection Capillary Pressure (MICP) data in combination with the Hierarchical Clustering Analysis (HCA) method. In the next step, polynomial meta...


Estimating Missing Data Using Neural Network Techniques, Principal Component Analysis and Genetic Algorithms

The common problem of missing data in databases is being dealt with, in recent years, through estimation methods. Auto-associative neural networks combined with genetic algorithms have proved to be a successful approach to missing data imputation. Similarly, two new auto-associative models are developed to be used along with the Genetic Algorithm to estimate missing data and these approaches ar...



Journal title:
  • CoRR

Volume abs/1403.5488

Publication date 2014